ABSTRACT

Precise recommendation of users to follow improves the user
experience and helps sustain the growth of Twitter-like microblog
platforms. In this paper, we design a hybrid microblog recommender
system as a solution to the KDD Cup 2012 Track 1 task, which requires
predicting which users a given user might follow in Tencent Microblog.
We describe the background of the problem and present the algorithm,
consisting of keyword analysis, user taxonomy, (potential) interest
extraction and item recommendation. In our experiment the algorithm
reaches an overall MAP@3 of about 0.41. Some possible improvements
are discussed as directions for further study.

General Terms 

Algorithm, Machine Learning, Data Mining 

Keywords 

1. INTRODUCTION

Online social networking services like Twitter have become
tremendously popular, with rapid user growth. Since microblogging,
the Chinese counterpart of Twitter, was introduced in 2007, thousands
of new registrations have been observed every day on dominant
platforms such as Sina and Tencent Microblog. Celebrities and
organizations also register microblog accounts, which diversifies the
topics and helps attract more potential users. However, the flood of
information can confuse users and even drive them away. Reducing this
confusion and recommending attractive items (specific users selected
for recommendation) are therefore crucial for improving the user
experience and sustaining the platform, and they present
opportunities for novel machine learning and data mining approaches.

Recommender systems can be categorized into content-based algorithms
[7], collaborative filtering [5], and influential ranking algorithms
[12]. Unfortunately, all of them pay little attention to the fidelity
of user profiles, the variance of preferences and user interactions,
which makes precise and stable recommendation difficult. To overcome
these weaknesses of single methods, we construct a hybrid recommender
system tailored to Tencent Microblog, which generates an ordered item
list by mining the data of the platform [8].

The rest of the paper is organized as follows. Section 2 
will discuss the background of the problem, and Section 3 
will describe the design of the hybrid recommender system, 
including keyword analysis, user taxonomy, preference ex- 
traction, discovery of potential interests and generation of 
ordered recommendation. Section 4 will present the train- 
ing process. Section 5 will show the experimental results and 
discuss some improvements, and the paper will be concluded 
in Section 6. 

2. BACKGROUND 

Observing the popularity of Twitter-like services, Tencent, one of
China's leading Internet service portals, launched its microblog
platform, Tencent Microblog, in 2010. It has attracted a large number
of registered users (425 million registered accounts and 67 million
daily active users in the first quarter of 2012 [11]) and has become
one of the dominant microblog platforms in China, building on the
large user base of its instant messaging service QQ (711.7 million
[10] on Sep 30, 2011). Celebrities and organizations, the items
carefully categorized into hierarchies, are invited to register on
the platform, leading to healthy growth of the user base.
Furthermore, Tencent's microblog service is embedded in its other
leading platforms such as Shuoshuo (the signature of a user's QQ
account), Qzone (blog platform), Pengyou.com (SNS service) and Weixin
(mobile messenger), hence a user can write or comment on a message
directly on the Tencent Microblog website or via third-party ports
and related platforms.

While Tencent has the largest microblog user group, Sina Microblog
takes a commanding lead over its competitors, with 56.5% of China's
microblog market based on active users and 86.6% based on browsing
time [6]. This illusory prosperity of Tencent Microblog, which is far
from the public perception, results from the existence of fake users
(explained in Section 3.2), the widely used spammer strategy [5] and
an unusual definition of active users. Tencent Microblog considers
those who frequently write (retweet or comment on) or read microblog
messages, whether on the website or on other associated platforms, as
active users, while Twitter and Sina Microblog define them as those
who log in to the platform every day. User messages generated via
other related platforms make it harder for the recommender to find
the real interests of users, which lowers the acceptance rate.
Tencent Microblog users accept only a small percentage of the
recommendations (less than 9% according to our survey [8]), and the
recommended item lists are not updated in time, so they deviate from
the users' present preferences.

3. ALGORITHMS 

As shown in prior studies, each of the existing algorithms has its
own unavoidable limitations. This section introduces a hybrid
recommender system to overcome them, including the preparations
(keyword analysis and user taxonomy) and the main part of the system.
3.1 Keyword Analysis

Mining synonyms in users' keywords helps in finding their interests.
However, applying an association rule algorithm [2] to find them
directly in the huge keyword set is unrealistic since that involves
searching all possible combinations. So we parallelize this process
by adopting a revised FDM (Fast Distributed Mining of association
rules) [4], based on the downward-closure property of support, which
guarantees that every subset of a frequent itemset is itself frequent
[9]. Moreover, we insert ambiguous keywords ('apple' for instance)
into different classes simultaneously, so the class sizes are not
uniform.

Let $U = \{u_1, u_2, \ldots, u_m\}$ be the set of users, where $u_i$
takes the value of its ID. Each user $u_j$ has a keyword set
$K_j = \{k_{j1}, k_{j2}, \ldots, k_{jn_j}\}$ with weights
$W_j = \{w_{j1}, w_{j2}, \ldots, w_{jn_j}\}$, and we denote
$\mathcal{K} = \bigcup_{j=1}^{m} K_j$ as the set of all users'
keywords. The database (user-keyword set) is divided into $n$ subsets
$\mathcal{DB}_i$ and these subsets are broadcast to the remote sites
$\mathcal{RM}_i$. $T^j$ is the result generated in the $j$-th
iteration:

$$T^j = \{T_i^j\}, \qquad T_i^j = \{k_{i1}^j, k_{i2}^j, \ldots\},$$

where $k_{ip}^j$ are the keywords of the $i$-th keyword transaction
in the $j$-th iteration. The Apriori algorithm is applied to generate
the candidate transaction set $C_i^j$ at $\mathcal{RM}_i$ at the
beginning of the $j$-th iteration as follows:

$$C_i^j = \mathrm{Apriori\_gen}(T^{j-1}),$$

where

$$C_i^j = \{C_{i1}^j, C_{i2}^j, \ldots, C_{ih}^j\}, \qquad
C_{il}^j = \{k_{l1}^j, k_{l2}^j, \ldots\}.$$

Let $A = \{A_1, A_2, \ldots, A_m\}$ be a set of mappings where

$$A_j : \mathcal{K} \to \mathbb{R}^{+} \cup \{0\}, \qquad
A_j(k) = \begin{cases} w_{ji}, & k = k_{ji} \in K_j \\ 0, & \text{otherwise.} \end{cases}$$

$\mathcal{RM}_i$ computes the local support and confidence of
$C_{il}^j$ by

$$\mathrm{supp\_local}(C_{il}^j) = \operatorname{average}\big(\min A_p(k_{lk}^j)\big),
\qquad k_{lk}^j \in C_{il}^j,\ u_p \in \mathcal{DB}_i,$$

and eliminates those which fail to satisfy the local minimums
supp_local or conf_local. Then the remote site sends the remaining
candidate transactions $C_{il}^j$ to the polling site
$\mathcal{PC}_k$, where $k = \mathrm{polling}(C_{il}^j)$ is a hash
function.

$\mathcal{PC}_k$ gathers the candidate transactions $C_{il}^j$ and
computes the global support $\mathrm{supp\_global}(C_{il}^j)$ and
confidence $\mathrm{conf\_global}(C_{il}^j)$ by requesting the local
values from the remote sites:

$$\mathrm{supp\_global}(C_{il}^j) = \operatorname{average}\big(\min A_p(k_{lk}^j)\big),
\qquad k_{lk}^j \in C_{il}^j,\ u_p \in \mathcal{DB}.$$

Then $\mathcal{PC}_k$ filters out the candidates which fail to
satisfy the constraints of supp_global and conf_global. Generally the
local minimums coincide with the global ones. For convenience we
still denote the updated candidate sets as $C_i^j$.

Then the home site gathers $C_i^j$ from the polling sites to generate
the result of transactions (keyword classes) in the $j$-th iteration:

$$T^j = \bigcup_i C_i^j,$$

where every $T_l^j \in T^j$ satisfies

$$\mathrm{supp\_global}(T_l^j) \ge \mathrm{supp\_global}, \qquad
\mathrm{conf\_global}(T_l^j) \ge \mathrm{conf\_global}.$$

The process is terminated if no new transaction is generated.
Otherwise the home site broadcasts each transaction $T_l^j$ to the
remote site $\mathcal{RM}_k$, where $\mathcal{RM}_k$ is the original
remote site of $T_l^j = C_{km}^j$, and then starts the next
iteration.

The final result of keyword classes is

$$\mathrm{keyword\_class} = \{class_1, class_2, \ldots, class_N\},$$

where

$$class_i = \{k_{i1}, k_{i2}, \ldots, k_{in_i}\}$$

is a set of synonyms.

The choice of minimums affects the precision and computational
complexity tremendously. We sampled 1000 users' keywords and found
that their keyword weights average 0.14, so we assign

$$\mathrm{supp\_local} = \mathrm{supp\_global} = 0.2,$$

a little higher than the average weight. conf_local / conf_global is
affected by supp_local / supp_global; in this case

$$\mathrm{conf\_local} = \mathrm{conf\_global} = \frac{0.14}{0.2} = 0.7.$$
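
To make the weighted support used in the keyword analysis concrete, the following minimal Python sketch computes, for one candidate keyword set, the average over users of the smallest weight a user gives to any keyword in the set, and then filters candidates by a threshold. It runs on a single machine; the function names, the toy users and their weights are illustrative assumptions and not the distributed FDM implementation described in this section.

```python
from statistics import mean


def weighted_support(candidate, user_keywords):
    """Average over users of the smallest weight given to any keyword in `candidate`.

    A user lacking one of the keywords contributes 0, mirroring A_p(k) = 0.
    """
    if not user_keywords:
        return 0.0
    return mean(
        min(weights.get(keyword, 0.0) for keyword in candidate)
        for weights in user_keywords.values()
    )


def frequent_candidates(candidates, user_keywords, min_support=0.2):
    """Keep the candidates whose weighted support reaches the threshold."""
    return [c for c in candidates if weighted_support(c, user_keywords) >= min_support]


# Hypothetical toy data: user -> {keyword: weight}.
users = {
    "u1": {"apple": 0.5, "iphone": 0.5},
    "u2": {"apple": 0.3, "iphone": 0.4, "fruit": 0.3},
    "u3": {"fruit": 1.0},
}
candidates = [frozenset({"apple", "iphone"}), frozenset({"apple", "fruit"})]
kept = frequent_candidates(candidates, users)
```

With the threshold 0.2 from the text, only the pair {apple, iphone} survives in this toy example, because the pair {apple, fruit} is shared by a single user with a small weight.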



3.2 User Taxonomy

Most microblog platforms divide their users into 2 groups, active and
inactive, and apply different strategies to each. However, some users
do not log in to Tencent Microblog directly although they have
records of tweets generated from other related platforms, and these
messages hardly reflect their interests. Furthermore, they rarely
interact with other users and have few favorites for the same reason.
So we classify them as fake users (see Figure 1). In addition, we
also consider spammers as fake since they seldom use the microblog,
even indirectly.

Figure 1: User taxonomy. Active users have more followees than
inactive users. Links related to fake users are eliminated since they
are not real users in the user network.

Due to the absence of login records, the activeness function
$\mathrm{act}(u_j)$ counts the number of tweets and interactions and
computes $u_j$'s activeness by applying the thresholds min_activeness
and min_action:

$$\mathrm{act}(u_j) = tweet \times \mathrm{is\_fake}(u_j),$$

where

$$\mathrm{is\_fake}(u_j) = \frac{1 + \mathrm{sgn}(at + retweet + comment - \mathrm{min\_action})}{2}$$

and

$$u_j \text{ is } \begin{cases}
\text{active}, & \mathrm{act}(u_j) > \mathrm{min\_activeness} \\
\text{inactive}, & 0 < \mathrm{act}(u_j) \le \mathrm{min\_activeness} \\
\text{fake}, & \mathrm{act}(u_j) = 0.
\end{cases}$$

We assign min_activeness = 100 and min_action = 20 since only 33.2%
of the users have written more than 100 tweets (771599 of 2320895),
and apply the algorithm to divide the user group into 3 classes.

An appropriate user taxonomy helps improve the precision of
recommendation. Users with similar favorites often accept similar
items, hence dividing users into smaller groups by their interests
can balance precision and computational complexity. However, we have
not done this due to the sparsity of successful recommendation
records, which reflect the user's interests directly.

3.3 Generating Recommendations

After keyword analysis and user taxonomy, which are the preparations
of the recommendation, comes the main part of our hybrid recommender
system. It consists of item popularity ranking, (potential) interest
discovery and the grading function, and it generates recommended
items and evaluates the possibility of acceptance or rejection. The
system maps the users' (potential) interests to their corresponding
item categories and grades selected candidates in these categories
with indicators of similarity and popularity. It also contains
special algorithms for fake users in order to reach a precise
recommendation.

3.3.1 Item Popularity Ranking

An item is a specific user, which can be a famous person, an
organization, or a group. Items are organized by Tencent in different
categories of professional domains to form a hierarchy (see Figure
2). For example, an item, Dr. Kaifu LEE, is represented as
science-and-technology.internet.mobile [1].

Figure 2: Item categories organized in hierarchy. The pointed item
belongs to the category a.b.d.f.

The number of an item's followers indicates its popularity directly.
Recommending hot items in a field the user is interested in
effectively raises the probability of acceptance. For users who show
little of their preferences (especially the fake users) we recommend
the most popular items in the whole item set.

Let $I = \{i_1, i_2, \ldots, i_n\}$ be the item set and $h_k$ be the
category (hierarchy) of an item $i_k$ ($h_k$ may coincide with some
$h_j$, and the set of $h_k$ is denoted as $H$). Then the rank of
$i_k$ in $h_k$ is computed by

$$hot_k = \mathrm{get\_hot\_rank}(i_k, h_k),$$

where the function counts the number of $i_k$'s followers and returns
its ranking in $h_k$. Similarly the hot rank of $i_k$ in the whole
item set $I$ is

$$HOT_k = \mathrm{GET\_HOT\_RANK}(i_k).$$

To apply the rank information in our grading process, we normalize
the rank by a function $n(x)$, and for convenience we still denote
the normalized results as $hot_k$ and $HOT_k$.
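
The activeness-based taxonomy of Section 3.2 and the follower-count ranking of Section 3.3.1 can be sketched together in a few lines of Python. The thresholds 100 and 20 come from the text; everything else is an assumption made for illustration, in particular the tie-breaking by item id and the linear `normalize_rank`, which is only a stand-in for the normalization function n(x).

```python
def activeness(tweets, at, retweet, comment, min_action=20):
    """act(u) = tweet * (1 + sgn(at + retweet + comment - min_action)) / 2."""
    diff = at + retweet + comment - min_action
    sgn = (diff > 0) - (diff < 0)
    return tweets * (1 + sgn) / 2


def user_class(act, min_activeness=100):
    """Map an activeness value to one of the three user classes."""
    if act > min_activeness:
        return "active"
    return "inactive" if act > 0 else "fake"


def hot_ranks(follower_counts):
    """Rank items by follower count; rank 1 has the most followers.

    Ties are broken by item id (an assumption, the text does not say).
    """
    ordered = sorted(follower_counts, key=lambda item: (-follower_counts[item], item))
    return {item: rank for rank, item in enumerate(ordered, start=1)}


def normalize_rank(rank, size):
    """Hypothetical n(x): rank 1 maps to 1.0, the last rank to 1/size."""
    return (size - rank + 1) / size


# Hypothetical follower counts of three items in one category.
ranks = hot_ranks({"a": 500, "b": 1200, "c": 90})
hot = {item: normalize_rank(rank, len(ranks)) for item, rank in ranks.items()}
```

Calling `hot_ranks` on the follower counts of a single category gives hot_k, and calling it on the whole item set gives HOT_k; the latter is what the system falls back to for fake users.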




Figure 3: User network with indirect links (depth = 2). Indirectly
linked users interact with each other even without followships. The
length of the arrows represents the familiarity between two users (a
user may be more familiar with some indirectly linked users than with
its followees).

3.3.2 Mining Interests from Keywords

Users are inclined to accept items that match their interests. Active
users have more keywords which reflect their favorites, and we map
these interests to the hierarchy $H$ to obtain candidate items.

Consider the keyword class set

$$\mathrm{keyword\_class} = \{class_1, class_2, \ldots, class_N\}$$

generated by the keyword analysis. A mapping

$$\mathcal{KH} : H \to \mathrm{power}(\mathrm{keyword\_class}), \qquad
\mathcal{KH}(h_k) = \{class_{k1}, class_{k2}, \ldots, class_{kn_k}\}$$

is defined to construct the keyword classes of a given category $h_k$
(see Section 3.3.4 for details). Suppose a given user $u_j$ (or a
given item $i_k$) has keywords $K_j = \{k_1, k_2, \ldots, k_{n_j}\}$
with weights $\{w_1, w_2, \ldots, w_{n_j}\}$ which satisfy
$\sum_i w_i = 1$. A function

$$\mathrm{key\_class}(u_j) = \{class_{j1}, class_{j2}, \ldots, class_{jm_j}\}$$

computes the keyword classes of a given user $u_j$ (or item $i_k$),
where $class_{ji}$ satisfies

$$K_j \cap class_{ji} \ne \emptyset,$$

and the corresponding weight of $class_{ji}$ is

$$W_{ji} = \sum_{k_l \in K_j \cap class_{ji}} w_l.$$

After keyword analysis of individuals the target categories
$\{h_k\}$ are generated, where $h_k$ satisfies

$$\mathcal{KH}(h_k) \cap \mathrm{key\_class}(u_j) \ne \emptyset,$$

hence the candidate items are the items in each $h_k$ (suppose $i_k$
included). A vector function is defined on $U \cup I$ as follows:

$$\mathrm{class\_weight}(u_j) = (weight_1, weight_2, \ldots, weight_N)$$

(here $u_j$ can be substituted by $i_k$), where

$$weight_i = \begin{cases}
W_{jl}, & class_i = class_{jl} \in \mathrm{key\_class}(u_j) \\
0, & class_i \notin \mathrm{key\_class}(u_j).
\end{cases}$$

The similarity between $i_k$ and $u_j$ is the normalized Euclidean
distance of these 2 vectors:

$$\mathrm{sim}(u_j, i_k) = n\big(\lVert \mathrm{class\_weight}(u_j) - \mathrm{class\_weight}(i_k) \rVert\big),$$

where $n(x)$ is the normalization function mentioned in Section
3.3.1. Because the non-zero values are sparse, we do not compute
these vectors directly, which reduces the storage and computational
complexity.
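
The class weights and the distance-based similarity of this section can be sketched with sparse dictionaries instead of dense vectors, which matches the remark that the vectors are never materialized. This is a minimal illustration: the keyword classes and weights are made up, and the normalization n(x) = 1/(1 + x) is a hypothetical choice that maps identical vectors to 1.

```python
import math


def key_class_weights(keyword_weights, keyword_classes):
    """W_ji: sum of the weights of the user's keywords that fall in each class."""
    result = {}
    for name, members in keyword_classes.items():
        total = sum(w for k, w in keyword_weights.items() if k in members)
        if total > 0:
            result[name] = total
    return result


def sparse_distance(a, b):
    """Euclidean distance between two sparse class-weight vectors (dicts)."""
    keys = a.keys() | b.keys()
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def similarity(a, b):
    """Hypothetical normalization n(x) = 1 / (1 + x)."""
    return 1.0 / (1.0 + sparse_distance(a, b))


# Hypothetical keyword classes; 'apple' is ambiguous and sits in both.
classes = {"tech": {"apple", "iphone"}, "food": {"apple", "fruit"}}
user_w = key_class_weights({"apple": 0.5, "iphone": 0.5}, classes)
item_w = key_class_weights({"iphone": 1.0}, classes)
sim = similarity(user_w, item_w)
```

Only classes with a non-zero weight are stored, so the cost of a comparison grows with the number of classes a user or item actually touches rather than with N.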



3.3.3 Discovering Potential Interests 

Few inactive users have enough keywords, hence we design an indirect
collaborative filter to mine their potential interests from their
followees. We build the social network of a user $u_j$ and search for
its potential interests in its followees
$\mathrm{followee}(u_j) = \{u_k\}$ and even in its followees'
followees (see Figure 3).

Let depth be the maximal number of levels of the searching process;
in fact depth < 3 is enough for the process to mine a user's
potential interests. The related users of $u_j$ are

$$\mathrm{related\_users}(u_j) = \mathrm{search\_followee}(u_j, depth),$$

where $\mathrm{search\_followee}(u_j, depth)$ returns the followees
and indirect followees of $u_j$ up to level depth in the social
network. Then for every $u_k$ in $\mathrm{related\_users}(u_j)$ we
compute the keyword classes as mentioned above and merge them into
the set

$$\mathrm{potential\_key}(u_j) = \{class_{j1}, class_{j2}, \ldots, class_{jn_j}\}
= \bigcup_{u_k \in \mathrm{related\_users}(u_j)} \mathrm{key\_class}(u_k),$$

where the $i$-th keyword class $class_{ji}$ has the weight

$$W_{ji} = \sum_{\substack{u_k \in \mathrm{related\_users}(u_j) \\ class_{k i_k} = class_{ji}}} W_{k i_k}\, \mathrm{fami}(u_j, u_k).$$

$\mathrm{fami}(u_j, u_k)$ computes the familiarity of $u_j$ and $u_k$
from the indicators of interaction (at (@), retweet and comment),
which can only happen between linked users:

$$\mathrm{fami}(u_j, u_k) = \omega_1 f(at) + \omega_2 f(retweet) + \omega_3 f(comment),$$

where the $\omega_i$, satisfying $\sum_{i=1}^{3} \omega_i = 1$,
$\omega_i > 0$, are obtained in the training process and $f(x)$ is a
sigmoid function

$$f(x) = \frac{2}{1 + e^{-x}} - 1.$$

Finally we merge $\mathrm{key\_class}(u_j)$ and
$\mathrm{potential\_key}(u_j)$ into the set

$$\mathrm{interests}(u_j) = \{class_{j1}, class_{j2}, \ldots, class_{jn_j}\}$$

with the weight of $class_{ji}$

$$\tilde{W}_{ji} = \begin{cases}
W_{ji_1}, & class_{ji} \text{ only in } \mathrm{key\_class}(u_j) \\
W_{ji_2}, & class_{ji} \text{ only in } \mathrm{potential\_key}(u_j) \\
\tfrac{1}{2}(W_{ji_1} + W_{ji_2}), & class_{ji} \text{ in both sets,}
\end{cases}$$

where

$$class_{ji} = class_{ji_1} \in \mathrm{key\_class}(u_j), \qquad
class_{ji} = class_{ji_2} \in \mathrm{potential\_key}(u_j).$$

Correspondingly the target categories $\{h_k\}$ satisfy

$$\mathcal{KH}(h_k) \cap \mathrm{interests}(u_j) \ne \emptyset,$$

and the candidate items are in each $h_k$ as mentioned before
(suppose $i_k$ included). We modify the vector function
$\mathrm{class\_weight}(u_j)$ defined in Section 3.3.2 by using
$\mathrm{interests}(u_j)$ in place of $\mathrm{key\_class}(u_j)$,
i.e. $\tilde{W}_{ji}$ instead of $W_{ji}$, and compute the similarity
of $u_j$ and $i_k$ as before. In this way, we get the similarity
between items and inactive users. The algorithm can also be applied
to the recommendation for active users with a smaller value of depth.
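
The three-case merge of a user's own and potential class weights, and the selection of target categories, can be sketched as follows. The category names, class names and weights are hypothetical toy values chosen only to exercise each case.

```python
def merge_interests(own, potential):
    """Merge key_class and potential_key weights; average when both exist."""
    merged = {}
    for cls in own.keys() | potential.keys():
        if cls in own and cls in potential:
            merged[cls] = 0.5 * (own[cls] + potential[cls])
        elif cls in own:
            merged[cls] = own[cls]
        else:
            merged[cls] = potential[cls]
    return merged


def target_categories(interests, category_classes):
    """Categories whose keyword classes share at least one class with the interests."""
    return sorted(
        category
        for category, classes in category_classes.items()
        if interests.keys() & classes
    )


# Hypothetical weights and category-to-class mapping.
interests = merge_interests(
    {"tech": 0.5, "food": 0.25},
    {"tech": 0.25, "sport": 0.125},
)
targets = target_categories(
    interests,
    {"tech.mobile": {"tech"}, "sports.football": {"sport"}, "arts": {"music"}},
)
```

The category "arts" is dropped because none of its keyword classes appears in the merged interests, so none of its items becomes a candidate.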

3.3.4 Grading Function 

In the sections above we presented the extraction of a given user
$u_j$'s (potential) interests and the generation of the recommended
candidates. The grading function $\mathrm{grade}(u_j, i_k)$ computes
the possibility of acceptance (positive grade) or rejection (negative
grade) with indicators of $i_k$'s popularity and
$\mathrm{sim}(u_j, i_k)$ computed as above (see Figure 4). Then we
sort the candidates by grade in descending order and keep the first
$k$ of them as the final recommendation, where in our case $k = 3$.

Let $I = \{i_1, i_2, \ldots, i_n\}$ and $H = \{h_1, h_2, \ldots, h_n\}$
be the item set and the category (hierarchy) set as previously
defined and suppose we have extracted the keyword classes of each
user and item. We specify the definition of $\mathcal{KH}$ (see
Section 3.3.2) as follows:

$$\mathcal{KH} : H \to \mathrm{power}(\mathrm{keyword\_class}), \qquad
\mathcal{KH}(h_k) = \{class_{k1}, class_{k2}, \ldots, class_{kn_k}\}.$$

$\mathcal{KH}(h_k)$ computes the keyword classes of a given category
$h_k$ with the corresponding weight of $class_{kp}$

$$W_{kp} = \operatorname{average}(W_{ji'}), \qquad
i_j \in h_k,\ \mathrm{key\_class}(i_j) \ni class_{ji'} = class_{kp}.$$

Here power(keyword_class) is the power set of keyword_class.

We revise the definition of class_weight() by extending its domain to
$H$:

$$\mathrm{class\_weight}(h_k) = (weight_1, weight_2, \ldots, weight_N),$$

$$weight_i = \begin{cases}
W_{kp}, & class_i = class_{kp} \in \mathcal{KH}(h_k) \\
0, & class_i \notin \mathcal{KH}(h_k).
\end{cases}$$



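As previously mentioned we do not compute these vectors directly. The
function fond() is defined on $U \times H$ by

$$\mathrm{fond}(u_j, h_k) = g\big(\mathrm{class\_weight}(u_j) \cdot \mathrm{class\_weight}(h_k),\ 100\big),$$

$$g(x, y) = \frac{2(1 + e^{-y})}{(1 - e^{-y})(1 + e^{-xy})} - \frac{1 + e^{-y}}{1 - e^{-y}},$$

to compute the degree of $u_j$'s fondness for category $h_k$; note
that $g(0, y) = 0$ and $g(1, y) = 1$. Finally the grading function of
active/inactive users is

$$\mathrm{grade}(u_j, i_k) = 2\,\mathrm{fond}(u_j, h_k)\big(\alpha_1 hot_k + \alpha_2\, \mathrm{sim}(u_j, i_k)\big) - 1,$$
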
where $hot_k$ and $\mathrm{sim}(u_j, i_k)$ are the indicators
mentioned above and $\alpha_i$, $i = 1, 2$, satisfying
$\alpha_1 + \alpha_2 = 1$, $\alpha_i > 0$, are obtained in the
training process. Valued in $[-1, 1]$, the grade shows the
possibility of acceptance (positive grade) or rejection (negative
grade). Considering the variance of user preferences over a certain
period, a revised grading function is defined as

$$\mathrm{revised\_grade}(u_j, i_k) = \frac{1}{\lambda}\, \mathrm{time}(u_j, h_k)\, \mathrm{grade}(u_j, i_k),$$

where

$$\mathrm{time}(u_j, h_k) = 1 + (\lambda - 1)e^{t}, \qquad t \in (-\infty, 0].$$

$t$ is the time of $u_j$'s latest acceptance of items in $h_k$ (the
current time is $t = 0$ and the default time is $t = -\infty$ if
there is no acceptance). $\lambda$ indicates the proportion of recent
interest (in our experiment $\lambda = 2$).

Fake users, especially those with few keywords, receive different
treatment. Because the similarity and familiarity functions are
difficult to apply to them, $\mathrm{potential\_key}(u_j)$ is not
generated, i.e. $\mathrm{interests}(u_j) = \mathrm{key\_class}(u_j)$,
$\tilde{W}_{ji} = W_{ji}$. The grading function of their
recommendations only uses the hot rank and the preference:

$$\mathrm{grade}(u_j, i_k) = \big(1 + \mathrm{fond}(u_j, h_k)\big) HOT_k - 1.$$

This definition emphasizes the popularity of the item in $I$ and
increases the grade if $i_k$ and $u_j$'s interests coincide.
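
The grading functions of this section can be sketched in Python as below. The sketch assumes that the fondness, the normalized hot rank and the similarity all lie in [0, 1], so that the grade stays in [-1, 1]; the function names and the sample inputs are illustrative, and the value alpha1 = 0.33 is simply one example weight.

```python
import math


def g(x, y=100.0):
    """Scaled sigmoid with g(0, y) = 0 and g(1, y) = 1."""
    a = math.exp(-y)
    return 2.0 * (1.0 + a) / ((1.0 - a) * (1.0 + math.exp(-x * y))) - (1.0 + a) / (1.0 - a)


def fond(user_vec, category_vec):
    """Fondness from the dot product of two sparse class-weight vectors."""
    dot = sum(w * category_vec.get(cls, 0.0) for cls, w in user_vec.items())
    return g(dot)


def grade(fondness, hot, sim, alpha1):
    """Grade for active/inactive users, with alpha2 = 1 - alpha1."""
    return 2.0 * fondness * (alpha1 * hot + (1.0 - alpha1) * sim) - 1.0


def revised_grade(base_grade, t, lam=2.0):
    """Time-aware grade; t <= 0 is the time of the latest acceptance."""
    return (1.0 + (lam - 1.0) * math.exp(t)) * base_grade / lam


def fake_grade(fondness, hot_global):
    """Grade for fake users, driven by the global hot rank."""
    return (1.0 + fondness) * hot_global - 1.0


# Hypothetical inputs: full fondness, very popular and very similar item.
best = grade(1.0, 1.0, 1.0, alpha1=0.33)
stale = revised_grade(0.5, float("-inf"))
```

A user who accepted an item in the category just now (t = 0) keeps the full grade, while a user with no acceptance at all has the grade divided by lambda.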

4. TRAINING 

The choice of the parameters $\omega_i$ and $\alpha_i$ affects the
performance of the hybrid recommender system introduced above. The
training process obtains sets of optimal parameters for each user
group by applying a supervised online training algorithm (convergence
guaranteed and time-saving). Each record is presented once, saving
storage and time. Parameters are initialized stochastically and
updated during the training process until termination. However, we
omit the training of fake users' grading algorithm since its
parameters are already fixed.

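Let $\mathrm{result}(u_j, i_k)$ be a record of recommendation in the
training set, where

$$\mathrm{result}(u_j, i_k) = \begin{cases}
1, & \text{recommendation accepted} \\
-1, & \text{otherwise.}
\end{cases}$$

Consider the training error in the $n$-th epoch,

$$\mathrm{error}(n) = \mathrm{result}(u_j, i_k) - \mathrm{grade}(u_j, i_k),$$

where, as in Section 3.3.4,

$$\mathrm{grade}(u_j, i_k) = 2\,\mathrm{fond}(u_j, h_k)\big(\alpha_1 hot_k + \alpha_2\, \mathrm{sim}(u_j, i_k)\big) - 1.$$

The gradients (partial derivatives) of $\mathrm{error}(n)$ are

$$\frac{\partial}{\partial \alpha_1} \mathrm{error}(n) = 2\,\mathrm{fond}(u_j, h_k)\big(hot_k - \mathrm{sim}(u_j, i_k)\big),$$

$$\frac{\partial}{\partial \omega_1} \mathrm{error}(n) = 2\sigma\alpha_2\, \mathrm{fond}(u_j, h_k) \sum_i \frac{\partial \tilde{W}_{ji}}{\partial \omega_1},$$

$$\frac{\partial}{\partial \omega_2} \mathrm{error}(n) = 2\sigma\alpha_2\, \mathrm{fond}(u_j, h_k) \sum_i \frac{\partial \tilde{W}_{ji}}{\partial \omega_2},$$

where

$$\sigma = \mathrm{sgn}\big(\mathrm{class\_weight}(u_j) - \mathrm{class\_weight}(i_k)\big), \qquad
\tilde{W}_{ji} = \tfrac{1}{2}\big(W_{ji_1} + W_{ji_2}\big).$$

$W_{ji_1}$ is the weight of $u_j$'s own keyword class and $W_{ji_2}$
is computed by searching related users (see Section 3.3.3). By
expanding $\mathrm{fami}(u_j, u_k)$ with
$\omega_3 = 1 - \omega_1 - \omega_2$:

$$\frac{\partial \tilde{W}_{ji}}{\partial \omega_1}
= \frac{1}{2}\frac{\partial W_{ji_2}}{\partial \omega_1}
= \frac{1}{2} \sum_{\substack{u_k \in \mathrm{related\_users}(u_j) \\ class_{k i_k} = class_{ji_2}}} W_{k i_k}\big(f(at) - f(comment)\big),$$

$$\frac{\partial \tilde{W}_{ji}}{\partial \omega_2}
= \frac{1}{2}\frac{\partial W_{ji_2}}{\partial \omega_2}
= \frac{1}{2} \sum_{\substack{u_k \in \mathrm{related\_users}(u_j) \\ class_{k i_k} = class_{ji_2}}} W_{k i_k}\big(f(retweet) - f(comment)\big).$$

[Figure: user --(keyword analysis)--> interest (keyword class)
--(KH mapping)--> category (hierarchy) --(similarity & ranking)--> item]

Figure 4: Recommendation process. Keyword analysis extracts the
user's interests, the KH mapping matches these interests to the
corresponding item categories and thus obtains the set of candidates,
and the grading function assigns grades to the recommendations by
computing the similarity and popularity.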



After computing the gradients we update $\alpha_i$ and $\omega_i$
($i = 1, 2$) by applying a momentum factor $\beta \in [0, 1]$ (here
the typical value $\beta = 0.9$). The parameter $\alpha_i$ in the
$n$-th epoch, $\alpha_i(n)$, satisfies

$$\alpha_i(n + 1) = \alpha_i(n) + (1 - \beta)\frac{\partial}{\partial \alpha_i}\mathrm{error}(n) + \beta\,\Delta\alpha_i(n),$$

where

$$\Delta\alpha_i(n) = \alpha_i(n) - \alpha_i(n - 1).$$

The $\omega_i$, $i = 1, 2$, are updated similarly. The training
process is terminated in the $n$-th epoch if there is no new training
data $\mathrm{result}(u_j, i_k)$ or $\mathrm{error}(n) < performance$,
where the threshold $performance = 0.01$ controls the training
process.
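
An online pass with a momentum term can be sketched as follows. This is not a transcription of the update rule above: as an assumption, the sketch uses the standard delta rule, in which each step is the current error times the derivative of the grade with respect to alpha1, and it adds a hypothetical learning rate `rate` and clips alpha1 to [0, 1]. The momentum factor 0.9 and the stopping threshold 0.01 follow the text, and the records are toy data.

```python
def grade(fondness, hot, sim, alpha1):
    """Grade for active/inactive users, with alpha2 = 1 - alpha1."""
    return 2.0 * fondness * (alpha1 * hot + (1.0 - alpha1) * sim) - 1.0


def train_alpha(records, alpha1=0.5, beta=0.9, rate=0.1, performance=0.01):
    """One online pass; each record is (fondness, hot, sim, result), result in {1, -1}."""
    delta = 0.0
    for fondness, hot, sim, result in records:
        error = result - grade(fondness, hot, sim, alpha1)
        if abs(error) < performance:
            break
        slope = 2.0 * fondness * (hot - sim)  # derivative of grade w.r.t. alpha1
        delta = (1.0 - beta) * rate * error * slope + beta * delta
        alpha1 = min(1.0, max(0.0, alpha1 + delta))
    return alpha1


# Hypothetical records: accepted items that are popular but dissimilar,
# and accepted items that are similar but unpopular.
alpha_popular = train_alpha([(1.0, 0.9, 0.1, 1)] * 50)
alpha_similar = train_alpha([(1.0, 0.1, 0.9, 1)] * 50)
```

When users keep accepting popular items, alpha1 moves above its starting value, and when they keep accepting similar items it moves below, which is the behavior the trained parameters are meant to capture.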

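Furthermore, if we apply revised_grade() instead of grade() in the
grading process, the gradients should be revised as

$$\frac{\partial}{\partial \alpha_1} \mathrm{error}(n) = \frac{2}{\lambda}\, \mathrm{time}(u_j, h_k)\, \mathrm{fond}(u_j, h_k)\big(hot_k - \mathrm{sim}(u_j, i_k)\big),$$

$$\frac{\partial}{\partial \omega_1} \mathrm{error}(n) = \frac{2}{\lambda}\, \sigma\alpha_2\, \mathrm{time}(u_j, h_k)\, \mathrm{fond}(u_j, h_k) \sum_i \frac{\partial \tilde{W}_{ji}}{\partial \omega_1},$$

$$\frac{\partial}{\partial \omega_2} \mathrm{error}(n) = \frac{2}{\lambda}\, \sigma\alpha_2\, \mathrm{time}(u_j, h_k)\, \mathrm{fond}(u_j, h_k) \sum_i \frac{\partial \tilde{W}_{ji}}{\partial \omega_2}.$$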



5. EXPERIMENT AND IMPROVEMENT 

This section discusses the training result and the evaluation metric
we adopted to test the system's performance. Some other approaches
which might enhance the performance of the recommender system are
also introduced.

5.1 Training Result 

In our experiment we randomly sampled 5,938 users' recommendation
records from the dataset [8] and divided them into 2 subsets for
training and testing. We omitted the update of $\omega_i$ to reduce
the computational complexity and assigned $\omega_i = 1/3$, which
means at (@), retweet and comment carry the same weight when
computing $\mathrm{fami}(u_j, u_k)$. We trained each user's patterns
and computed the average parameters. Table 1 presents the results of
the training process.



user class   user   followee   interaction   keyword   $\alpha_1$
active       3919   46         87            10        0.33
inactive     1194   27         42            8         0.18
fake         825    18         2             5         /

Table 1: Training Sets and Optimal Parameters. Fake users' grading
function has no parameters to update so we omit the training process
of it.

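The result shows an evident discrepancy in $\alpha_1$, the weight of
popularity in the grading function, which reflects the inclination to
accept popular items. Inactive users ($\alpha_1 = 0.18$) prefer items
with similar interests, while active users ($\alpha_1 = 0.33$) give
relatively more weight to items with high popularity.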

5.2 Prediction and Precision Evaluation 

We computed $\mathrm{grade}(u_j, i_k)$ for all
$\mathrm{result}(u_j, i_k)$ in the testing subset and generated the
ordered item list of each $u_j$ (see Section 3.3.4) to test the
trained system. The evaluation metric is the average precision [14],
which the KDD Cup organizers adopted:

$$AP@3(u_j) = \sum_{i=1}^{3} p(i)\,\Delta r(i),$$

where $p(i)$ is the precision at the $i$-th recommended item and
$\Delta r(i)$ is the change in recall from $i - 1$ to $i$. Table 2
presents the MAP@3 (mean value of $AP@3(u_j)$) results and Table 3
presents the recommended item lists and the average precision of some
users. The precision of fake users' predictions is much lower than
the others' in our experiment due to the difficulty of extracting
their interests. Adjusting min_action or recommending their contacts
on other related platforms like QQ might help improve the results.
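
The metric can be sketched in a few lines of Python. The sketch assumes that the change in recall is 1 divided by the number of accepted items whenever the i-th recommended item is accepted and 0 otherwise, which reproduces the values of the three example users in Table 3; the function name is an illustrative choice.

```python
def average_precision_at_k(recommended, accepted, k=3):
    """AP@k: sum of precision at each accepted position times the recall step."""
    accepted = set(accepted)
    if not accepted:
        return 0.0
    hits = 0
    total = 0.0
    for position, item in enumerate(recommended[:k], start=1):
        if item in accepted:
            hits += 1
            total += hits / position  # precision at this position
    return total / len(accepted)      # each hit raises recall by 1/|accepted|


# Item lists and accepted items of the three users shown in Table 3.
ap_active = average_precision_at_k([1606902, 1760350, 1774452], [1606902, 1774452])
ap_inactive = average_precision_at_k([1606902, 1606609, 1774452], [1606902])
ap_fake = average_precision_at_k([1760642, 1774684, 1774862], [1774862])
```

MAP@3 is then the mean of these per-user values over all users in the test subset.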



active    inactive   fake      total
0.41066   0.46879    0.33606   0.41198

Table 2: Prediction Evaluation (MAP@3). Mining potential interests
from inactive users' followees improves the performance of
recommendation. Fake users' result is not as good as the others'.



u_j       user class   item                         accepted item      AP@3(u_j)
2071402   active       1606902, 1760350, 1774452    1606902, 1774452   0.83
942226    inactive     1606902, 1606609, 1774452    1606902            1.00
193889    fake         1760642, 1774684, 1774862    1774862            0.33

Table 3: Examples of Prediction. User 2071402 accepts the 1st and 3rd
items, then AP@3 = (1/1 + 2/3)/2 = 5/6; User 942226 only accepts the
1st item, then AP@3 = 1/1 = 1; User 193889 only accepts the 3rd item,
then AP@3 = 1/3.

5.3 Improvements of the System

There are several approaches to enhance the performance and overcome
the limitations of our system. Recommendation based on demographic
methods [13] can help raise the acceptance rate: users with similar
demographic information may share interests and thus accept similar
items. Refined keyword analysis and user taxonomy can also improve
the recommendation. Users who follow items in the same category or
interact with users who have explicit preferences can be grouped into
the same user class. They share synonyms in their keywords and are
likely to accept similar items because of their similar preferences.
Adapting to the frequently updated database of the microblog platform
makes it possible to capture users' present interests. Fortunately,
users' interests and behaviors are stable over a short period, so the
system only needs to be retrained stochastically and gradually, which
is fast and accurate.

6. CONCLUSION AND FUTURE WORK

We present a hybrid recommender system for microblogs to solve the
Track 1 task of KDD Cup 2012. The system analyzes the synonyms of
keywords and the behaviors of different users, extracts their
(potential) interests, finds the target categories, grades the
candidate items in those categories with indicators of popularity and
similarity, and finally generates an ordered item list for each user.
In our experiment the system reaches an overall MAP@3 of 0.41198. The
initialization of the grading function's parameters needs
improvement, since good choices accelerate the training process and
avoid local minima. Dynamic algorithms, which reduce the risk of
inaccuracy by searching for the best algorithm through competition,
also deserve further study.

7. ACKNOWLEDGMENTS

We would like to thank the organizers of KDD Cup 2012 for organizing
such a challenging and exciting competition. We would also like to
thank Jiuya Wang for helpful discussions and Dachao Li from the
High-Performance Computing Platform of Sun Yat-Sen University for
distributed computing support.

8. REFERENCES

[1] KDD Cup 2012. Predict which users (or information sources) one
    user might follow in Tencent Weibo.
    http://www.kddcup2012.org/c/kddcup2012-track1, 2012.

[2] R. Agrawal, T. Imieliński, and A. Swami. Mining association rules
    between sets of items in large databases. In Proceedings of the
    1993 ACM SIGMOD International Conference on Management of Data,
    SIGMOD '93, 1993.

[3] Baidu. Zombie fans on weibo.
    http://baike.baidu.com/view/4047998.htm, 2010.

[4] D. Cheung, J. Han, V. Ng, A. Fu, and Y. Fu. A fast distributed
    algorithm for mining association rules. In Fourth International
    Conference on Parallel and Distributed Information Systems,
    Dec. 1996.

[5] J. A. Konstan, B. N. Miller, D. Maltz, J. L. Herlocker,
    L. R. Gordon, and J. Riedl. GroupLens: applying collaborative
    filtering to Usenet news. Commun. ACM, 1997.

[6] Kyle. Sina commands 56% of China's microblog market.
    http://www.resonancechina.com/2011/03/30/sina-commands-56-of-chinas-microblog-market/,
    March 2011.

[7] M. McPherson, L. Smith-Lovin, and J. M. Cook. Birds of a feather:
    Homophily in social networks. Annual Review of Sociology, 2001.

[8] Y. Niu, Y. Wang, G. Sun, A. Yue, B. Dalessandro, C. Perlich, and
    B. Hamner. The Tencent Dataset and KDD-Cup'12. KDD-Cup Workshop,
    2012.

[9] P. Tan, M. Steinbach, and V. Kumar. Introduction to Data Mining,
    chapter 6. Association Analysis: Basic Concepts and Algorithms.
    Addison-Wesley, 2005.

[10] Tencent. About Tencent.
     http://www.tencent.com/en-us/at/abouttencent.shtml, 2012.

[11] Tencent. Tencent announces 2012 first quarter results.
     http://www.tencent.com/en-us/content/ir/news/2012/attachments/20120516.pdf,
     May 2012.

[12] Z. Wang, Y. Tan, and M. Zhang. Graph-based recommendation on
     social networks. In 12th International Asia-Pacific Web
     Conference (APWEB), 2010.

[13] Wikipedia. Demography.
     http://en.wikipedia.org/wiki/Demography.

[14] M. Zhu. Recall, precision and average precision. Working Paper
     2004-09, Department of Statistics & Actuarial Science,
     University of Waterloo, 2004.



